
Support reporting statistics in spark datasource #8057

Open: robert3005 wants to merge 3 commits into develop from rk/sparkstats

Conversation

@robert3005

Contributor

Spark mostly focuses on sizeInBytes, which we populate from file sizes with
scaling. We also report numRows, since our datasource already tracks it.
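
For readers unfamiliar with the mechanism: Spark's DataSource V2 API lets a scan implement `SupportsReportStatistics`, whose `estimateStatistics()` returns a `Statistics` object exposing `sizeInBytes()` and `numRows()` as `OptionalLong`. The sketch below is not the code in this PR. It is a standalone illustration, with no Spark dependency, of what "file sizes with scaling" could look like. The `FileInfo`, `ScanStats`, and `StatsEstimator` names, and the scaling rule itself (fraction of projected columns times an expansion factor), are assumptions made for the example.

```java
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical per-file metadata; not a type from the connector. */
final class FileInfo {
    final long sizeInBytes;
    final long rowCount;

    FileInfo(long sizeInBytes, long rowCount) {
        this.sizeInBytes = sizeInBytes;
        this.rowCount = rowCount;
    }
}

/** Mirrors the shape of Spark's Statistics interface: both values are optional. */
final class ScanStats {
    final OptionalLong sizeInBytes;
    final OptionalLong numRows;

    ScanStats(OptionalLong sizeInBytes, OptionalLong numRows) {
        this.sizeInBytes = sizeInBytes;
        this.numRows = numRows;
    }
}

final class StatsEstimator {
    /**
     * Assumed scaling rule (illustrative only):
     *   estimated bytes = on-disk bytes * (projected columns / total columns) * expansionFactor
     * where expansionFactor approximates how much larger the data is once decoded.
     * Row counts are summed as-is because they need no scaling.
     */
    static ScanStats estimate(List<FileInfo> files,
                              int projectedColumns,
                              int totalColumns,
                              double expansionFactor) {
        // Report "unknown" rather than a misleading zero when there is nothing to measure.
        if (files.isEmpty() || totalColumns <= 0) {
            return new ScanStats(OptionalLong.empty(), OptionalLong.empty());
        }
        long totalBytes = 0;
        long totalRows = 0;
        for (FileInfo f : files) {
            totalBytes += f.sizeInBytes;
            totalRows += f.rowCount;
        }
        double columnFraction = (double) projectedColumns / totalColumns;
        long scaledBytes = (long) Math.ceil(totalBytes * columnFraction * expansionFactor);
        return new ScanStats(OptionalLong.of(scaledBytes), OptionalLong.of(totalRows));
    }
}
```

In a real connector, the two values would be returned from `estimateStatistics()`. Returning an empty `OptionalLong` tells Spark the value is unknown, which is usually safer than reporting a guess of zero.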

@codspeed-hq

codspeed-hq Bot commented May 22, 2026


Merging this PR will improve performance by 29.37%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 3 improved benchmarks
✅ 1523 untouched benchmarks

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | `chunked_bool_canonical_into[(1000, 10)]` | 46.7 µs | 31.8 µs | +46.95% |
| Simulation | `chunked_varbinview_canonical_into[(1000, 10)]` | 197.9 µs | 161.8 µs | +22.34% |
| Simulation | `chunked_varbinview_into_canonical[(1000, 10)]` | 213.6 µs | 177.3 µs | +20.42% |

Tip

Curious why this is faster? Comment `@codspeedbot explain why this is faster` on this PR, or directly use the CodSpeed MCP with your agent.


Comparing rk/sparkstats (deb1dc0) with develop (f7f6d10)


@robert3005 force-pushed the rk/sparkstats branch 3 times, most recently from e97797d to 4d9b080 (May 28, 2026 01:02)
@robert3005 added the changelog/feature (A new feature) label on May 28, 2026
Signed-off-by: Robert Kruszewski <github@robertk.io>
@robert3005 requested a review from a team on June 9, 2026 21:56